{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "whjsJasuhstV"
   },
   "source": [
    "<a href=\"https://colab.research.google.com/github/jeffheaton/app_generative_ai/blob/main/t81_559_class_03_1_llm.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "euOZxlIMhstX"
   },
   "source": [
    "# T81-559: Applications of Generative Artificial Intelligence\n",
    "**Module 3: Large Language Models**\n",
    "* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)\n",
    "* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "d4Yov72PhstY"
   },
   "source": [
    "# Module 3 Material\n",
    "\n",
    "* **Part 3.1: Foundation Models** [[Video]](https://www.youtube.com/watch?v=Gb0tk5qq1fA) [[Notebook]](t81_559_class_03_1_llm.ipynb)\n",
    "* Part 3.2: Text Generation [[Video]](https://www.youtube.com/watch?v=lB97Lqt7q58) [[Notebook]](t81_559_class_03_2_text_gen.ipynb)\n",
    "* Part 3.3: Text Summarization [[Video]](https://www.youtube.com/watch?v=3MoIUXE2eEU) [[Notebook]](t81_559_class_03_3_text_summary.ipynb)\n",
    "* Part 3.4: Text Classification [[Video]](https://www.youtube.com/watch?v=2VpOwFIGmA8) [[Notebook]](t81_559_class_03_4_classification.ipynb)\n",
    "* Part 3.5: LLM Writes a Book [[Video]](https://www.youtube.com/watch?v=iU40Rttlb_Q) [[Notebook]](t81_559_class_03_5_book.ipynb)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AcAUP0c3hstY"
   },
   "source": [
    "# Google CoLab Instructions\n",
    "\n",
    "The following code ensures that Google CoLab is running and maps Google Drive if needed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/"
    },
    "id": "xsI496h5hstZ",
    "outputId": "4ead4862-6ccb-44b5-e0f1-2151723f0a79"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Note: using Google CoLab\n",
      "Collecting langchain\n",
      "  Downloading langchain-0.1.16-py3-none-any.whl (817 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m817.7/817.7 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain_openai\n",
      "  Downloading langchain_openai-0.1.4-py3-none-any.whl (33 kB)\n",
      "Requirement already satisfied: PyYAML>=5.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (6.0.1)\n",
      "Requirement already satisfied: SQLAlchemy<3,>=1.4 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.0.29)\n",
      "Requirement already satisfied: aiohttp<4.0.0,>=3.8.3 in /usr/local/lib/python3.10/dist-packages (from langchain) (3.9.5)\n",
      "Requirement already satisfied: async-timeout<5.0.0,>=4.0.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (4.0.3)\n",
      "Collecting dataclasses-json<0.7,>=0.5.7 (from langchain)\n",
      "  Downloading dataclasses_json-0.6.4-py3-none-any.whl (28 kB)\n",
      "Collecting jsonpatch<2.0,>=1.33 (from langchain)\n",
      "  Downloading jsonpatch-1.33-py2.py3-none-any.whl (12 kB)\n",
      "Collecting langchain-community<0.1,>=0.0.32 (from langchain)\n",
      "  Downloading langchain_community-0.0.34-py3-none-any.whl (1.9 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.9/1.9 MB\u001b[0m \u001b[31m26.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-core<0.2.0,>=0.1.42 (from langchain)\n",
      "  Downloading langchain_core-0.1.46-py3-none-any.whl (299 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m299.3/299.3 kB\u001b[0m \u001b[31m11.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting langchain-text-splitters<0.1,>=0.0.1 (from langchain)\n",
      "  Downloading langchain_text_splitters-0.0.1-py3-none-any.whl (21 kB)\n",
      "Collecting langsmith<0.2.0,>=0.1.17 (from langchain)\n",
      "  Downloading langsmith-0.1.51-py3-none-any.whl (115 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.0/116.0 kB\u001b[0m \u001b[31m5.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: numpy<2,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (1.25.2)\n",
      "Requirement already satisfied: pydantic<3,>=1 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.7.0)\n",
      "Requirement already satisfied: requests<3,>=2 in /usr/local/lib/python3.10/dist-packages (from langchain) (2.31.0)\n",
      "Requirement already satisfied: tenacity<9.0.0,>=8.1.0 in /usr/local/lib/python3.10/dist-packages (from langchain) (8.2.3)\n",
      "Collecting openai<2.0.0,>=1.10.0 (from langchain_openai)\n",
      "  Downloading openai-1.23.6-py3-none-any.whl (311 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m311.6/311.6 kB\u001b[0m \u001b[31m11.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting tiktoken<1,>=0.5.2 (from langchain_openai)\n",
      "  Downloading tiktoken-0.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.8 MB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.8/1.8 MB\u001b[0m \u001b[31m15.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.3.1)\n",
      "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (23.2.0)\n",
      "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.4.1)\n",
      "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (6.0.5)\n",
      "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp<4.0.0,>=3.8.3->langchain) (1.9.4)\n",
      "Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading marshmallow-3.21.1-py3-none-any.whl (49 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m49.4/49.4 kB\u001b[0m \u001b[31m2.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting typing-inspect<1,>=0.4.0 (from dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading typing_inspect-0.9.0-py3-none-any.whl (8.8 kB)\n",
      "Collecting jsonpointer>=1.9 (from jsonpatch<2.0,>=1.33->langchain)\n",
      "  Downloading jsonpointer-2.4-py2.py3-none-any.whl (7.8 kB)\n",
      "Collecting packaging<24.0,>=23.2 (from langchain-core<0.2.0,>=0.1.42->langchain)\n",
      "  Downloading packaging-23.2-py3-none-any.whl (53 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.0/53.0 kB\u001b[0m \u001b[31m4.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting orjson<4.0.0,>=3.9.14 (from langsmith<0.2.0,>=0.1.17->langchain)\n",
      "  Downloading orjson-3.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (141 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m141.1/141.1 kB\u001b[0m \u001b[31m4.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: anyio<5,>=3.5.0 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (3.7.1)\n",
      "Requirement already satisfied: distro<2,>=1.7.0 in /usr/lib/python3/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.7.0)\n",
      "Collecting httpx<1,>=0.23.0 (from openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hRequirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (1.3.1)\n",
      "Requirement already satisfied: tqdm>4 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.66.2)\n",
      "Requirement already satisfied: typing-extensions<5,>=4.7 in /usr/local/lib/python3.10/dist-packages (from openai<2.0.0,>=1.10.0->langchain_openai) (4.11.0)\n",
      "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (0.6.0)\n",
      "Requirement already satisfied: pydantic-core==2.18.1 in /usr/local/lib/python3.10/dist-packages (from pydantic<3,>=1->langchain) (2.18.1)\n",
      "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.3.2)\n",
      "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (3.7)\n",
      "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2.0.7)\n",
      "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests<3,>=2->langchain) (2024.2.2)\n",
      "Requirement already satisfied: greenlet!=0.4.17 in /usr/local/lib/python3.10/dist-packages (from SQLAlchemy<3,>=1.4->langchain) (3.0.3)\n",
      "Requirement already satisfied: regex>=2022.1.18 in /usr/local/lib/python3.10/dist-packages (from tiktoken<1,>=0.5.2->langchain_openai) (2023.12.25)\n",
      "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio<5,>=3.5.0->openai<2.0.0,>=1.10.0->langchain_openai) (1.2.1)\n",
      "Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m7.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai<2.0.0,>=1.10.0->langchain_openai)\n",
      "  Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n",
      "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m4.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
      "\u001b[?25hCollecting mypy-extensions>=0.3.0 (from typing-inspect<1,>=0.4.0->dataclasses-json<0.7,>=0.5.7->langchain)\n",
      "  Downloading mypy_extensions-1.0.0-py3-none-any.whl (4.7 kB)\n",
      "Installing collected packages: packaging, orjson, mypy-extensions, jsonpointer, h11, typing-inspect, tiktoken, marshmallow, jsonpatch, httpcore, langsmith, httpx, dataclasses-json, openai, langchain-core, langchain-text-splitters, langchain_openai, langchain-community, langchain\n",
      "  Attempting uninstall: packaging\n",
      "    Found existing installation: packaging 24.0\n",
      "    Uninstalling packaging-24.0:\n",
      "      Successfully uninstalled packaging-24.0\n",
      "Successfully installed dataclasses-json-0.6.4 h11-0.14.0 httpcore-1.0.5 httpx-0.27.0 jsonpatch-1.33 jsonpointer-2.4 langchain-0.1.16 langchain-community-0.0.34 langchain-core-0.1.46 langchain-text-splitters-0.0.1 langchain_openai-0.1.4 langsmith-0.1.51 marshmallow-3.21.1 mypy-extensions-1.0.0 openai-1.23.6 orjson-3.10.1 packaging-23.2 tiktoken-0.6.0 typing-inspect-0.9.0\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "try:\n",
    "    from google.colab import drive, userdata\n",
    "    COLAB = True\n",
    "    print(\"Note: using Google CoLab\")\n",
    "except:\n",
    "    print(\"Note: not using Google CoLab\")\n",
    "    COLAB = False\n",
    "\n",
    "# OpenAI Secrets\n",
    "\n",
    "# OpenAI Secrets\n",
    "if COLAB:\n",
    "    os.environ[\"OPENAI_API_KEY\"] = userdata.get('OPENAI_API_KEY')\n",
    "\n",
    "# Install needed libraries in CoLab\n",
    "if COLAB:\n",
    "    !pip install langchain langchain_openai"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "pC9A-LaYhsta"
   },
   "source": [
    "# 3.1: Foundation Models\n",
    "\n",
    "A foundation model for large language models (LLMs) refers to a base model that has been pre-trained on a broad range of data and can be adapted or fine-tuned for specific tasks or applications. These models are called \"foundation\" because they provide a foundational layer of knowledge and capabilities upon which specialized functionalities can be built.\n",
    "\n",
    "Several prominent technology companies and research organizations provide large language models. Notable among them are OpenAI with models like GPT (Generative Pre-trained Transformer), Google with BERT (Bidirectional Encoder Representations from Transformers) and other variants, and Facebook (Meta) which offers models such as RoBERTa (Robustly Optimized BERT Pretraining Approach).\n",
    "\n",
    "Training a large language model from scratch involves significant computational resources and expertise. It requires extensive data collection, cleaning, and processing, along with access to high-powered computing infrastructure capable of handling immense datasets. The financial cost of training such models can run into millions of dollars, making it prohibitive for most individuals and even many organizations. Given these requirements, developing a large language model from scratch is generally beyond the scope of an academic course. Instead, courses may focus on teaching how to use and fine-tune existing models to solve specific problems or conduct research.\n",
    "\n",
    "## How to Evaluate a Foundation Model\n",
    "\n",
    "Evaluating foundation models is crucial to understand their capabilities, limitations, and suitability for specific tasks. Accurate evaluation ensures that the models are safe, reliable, and effective in real-world applications. It also helps in identifying potential biases and errors that could impact their performance.\n",
    "\n",
    "### Open vs. Closed Weights:\n",
    "\n",
    "* **Open weights** refer to models where the trained parameters are publicly accessible. This transparency allows researchers and developers to understand the model's workings, replicate studies, and customize or improve the model further.\n",
    "* **Closed weights** are proprietary models with restricted access to their parameters. These are typically offered by companies as part of commercial products or services, where revealing the weights might compromise business interests or user privacy.\n",
    "\n",
    "### Number of Parameter Weights:\n",
    "\n",
    "The number of parameter weights in a model indicates its capacity or complexity. Parameter weights are typically expressed in millions (M), billions (B), or trillions (T). The higher the number of parameters, the greater the model’s theoretical ability to capture complex patterns and nuances in data. However, this is not a strict indicator of performance across all tasks but rather a general measure of potential.\n",
    "\n",
    "### Pros and Cons of More Weights:\n",
    "\n",
    "**Pros:**\n",
    "* **Increased Learning Capacity:** Models with more parameters can learn more detailed and nuanced representations of data, potentially improving their accuracy and effectiveness across diverse tasks.\n",
    "* **Better Generalization:** Larger models often generalize better to new, unseen datasets, provided they are trained appropriately.\n",
    "\n",
    "**Cons:**\n",
    "* **Computational Cost:** More parameters mean higher computational demands for training and inference, requiring more powerful hardware and longer processing times.\n",
    "* **Risk of Overfitting:** Without proper regularization and training techniques, larger models might overfit the training data, performing well on seen data but poorly on new or varied datasets.\n",
    "* **Environmental Impact:** Training larger models consumes more energy, contributing to larger carbon footprints.\n",
    "\n",
    "### Importance of Context Window Size:\n",
    "The context window size of a model refers to the maximum number of tokens (words or pieces of words) the model can consider at one time when making predictions or generating text. This is crucial because:\n",
    "\n",
    "* **Longer Context:** A larger window size allows the model to consider more information, which can lead to more coherent and contextually appropriate outputs. It's particularly important for tasks involving long documents or complex dependencies between parts of the text.\n",
    "* **Handling Dependencies:** The ability to handle long-range dependencies within the text can dramatically improve performance in tasks like summarization, question answering, and conversation.\n",
    "\n",
    "Understanding these aspects helps in making informed decisions about which model to use for a specific application and how to optimize its performance.\n",
    "\n",
    "The following table provides key statistics for many popular LLMs.\n",
    "\n",
    "### Understanding Tokens\n",
    "\n",
    "In the context of Large Language Models (LLMs) like GPT (Generative Pre-trained Transformer), tokens are the basic units of text that the model processes. A token can be a word, part of a word, or even punctuation. The definition of what exactly constitutes a token depends on the tokenizer used during the training of the model. For example, the word \"smiling\" might be a single token, or it could be split into smaller sub-word units like \"smile\" and \"ing\" depending on the tokenizer's vocabulary and rules.\n",
    "\n",
    "The cost of using a LLM for generating text is often calculated based on the number of tokens processed. This includes both the tokens that make up the input and the tokens generated as output. Because the computational resources required to process each token are significant, especially in models with billions of parameters, the number of tokens directly influences the computational cost. Thus, understanding how to efficiently manage token usage is crucial for optimizing expenses when using LLMs.\n",
    "\n",
    "The context window size of a LLM refers to the maximum number of tokens from the input that the model can consider at one time when making predictions. For example, if a model has a context window of 1,024 tokens, it can only consider the most recent 1,024 tokens of a given input. This limitation affects how much information the model can utilize at once. If the input exceeds this size, the model may not be able to refer back to earlier parts of the text, potentially impacting the coherence and relevance of its responses.\n",
    "\n",
    "The concept of converting tokens into \"pages\" is a useful metaphor for understanding how much content a model can handle or generate. While there's no universal standard for how many tokens equate to a \"page\" of text, a rough approximation is often used based on typical word counts per page in standard documents. For instance, if one page of text typically contains about 500 words, and assuming an average of 1.5 tokens per word, a page would roughly equate to 750 tokens.\n",
    "\n",
    "This approximation can help users gauge how much text they can input or expect in output in more familiar terms, like pages, which can be particularly helpful in educational, professional, or literary contexts where traditional page counts are more commonly used as a measure of content volume.\n",
    "\n",
    "## Common Large Language Models\n",
    "\n",
    "The following table provides key statistics for many popular LLMs. In this course, we will focus primarily on the OpenAI models, which are the first three rows of this table.\n",
    "\n",
    "| Name             | Creator    | Open   | MMLU  | Context Window Tokens |\n",
    "|------------------|------------|--------|-------|-----------------------|\n",
    "| gpt-4o           | OpenAI     | Closed | 88.7  | 128K (93 pages)       |\n",
    "| gpt-4o-mini      | OpenAI     | Closed | 82    | 128K (93 pages)       |\n",
    "| Gemini 1.5 Ultra | Google     | Closed | 90    | 1M (749 pages)\n",
    "| Gemini 1.5 Pro   | Google     | Closed | 85.9  | 1M (749 pages)\n",
    "| Gemini 1.5 Flash | Google     | Closed | 78.9  | ?\n",
    "| Gemini 1.5 Nano  | Google     | Closed | 72.8  | ?\n",
    "| Claude 3 Opus    | Anthropic  | Closed | 88.2  | 200K (146 pages)\n",
    "| Claude 3 Haiku   | Anthropic  | Closed | 76.7  | 200K (146 pages)    \n",
    "| Llama 3.1 405B   | Meta       | Open   | 87.3  | 512K (374 pages)\n",
    "| Llama 3.1 70B    | Meta       | Open   | 80.9  | 256K (187 pages)\n",
    "| Llama 3.1 8B     | Meta       | Open   | 73.0  | 128K (94 pages)\n",
    "| Mistral 7B       | Mistral.AI | Open   | 60.1  | 32K (23 pages)\n",
    "\n",
    "You will see that I estimate the number of pages, assuming 1,000 tokens would correspond to approximately 1.5 pages. This estimate assumes the same average of around 500 words per page and that each token corresponds to about 0.7314 words. The exact number might vary depending on formatting and content, but 1.5 pages per 1,000 tokens is a reasonable estimate.\n",
    "\n",
    "We also use the [MMLU benchmark](https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu) to measure how advanced the model is. Think of it as a sort of IQ of the models, higher is better.\n",
    "\n",
    "In this course, we will make use of the following OpenAI models, summarized here:\n",
    "\n",
    "* **gpt-4o-mini** - The most cost effective model available from OpenAI, the vast majority of the assignments will make use of this model.\n",
    "* **gpt-4o** - A somewhat more expensive model that we will use when we need more in-depth answers than mini provides.\n",
    "\n",
    "These different models have a variety of costs associated with them. I will provide guidelines for the most cost-effective model for each assignment. I present a summary of costs here.\n",
    "\n",
    "| Model        | Input/Output Cost/1M | Notes                       \n",
    "|--------------|----------------------|-----------------------------|\n",
    "| gpt-4o-mini  | $0.15/$0.60          | Use for most coursework     |\n",
    "| gpt-4o       | $5.00/$15.00         | Use for more advanced tasks |\n",
    "\n",
    "As you can see, in this course we will primarily use the **gpt-4o-mini** model to keep costs under control. When longer input/output are needed, by virtue of the context window, we will use the more expensive **gpt-4o**.\n",
    "\n",
    "When you specify a model string, such as \"gpt-3.5-turbo\", you will use the latest GTP 3.5 Turbo model. Specifying a specific model version, such as \"gpt-4-turbo-2024-04-09\" is also possible. For the course, I will always use the latest version, \"got-3.5-turbo,\" and not specify a specific version directly. For the utmost control, you can specify precise versions.\n",
    "\n",
    "## Understanding Temperature\n",
    "\n",
    "\n",
    "OpenAI typically refers to its temperature setting in the context of AI models like ChatGPT. The temperature setting controls the randomness or creativity in the model's responses. A lower temperature (e.g., 0.0) results in more deterministic and predictable outputs, where the model strongly favors the most likely responses. On the other hand, a higher temperature (e.g., 1.0) increases randomness, leading to more varied and sometimes more creative outputs. The exact range can vary depending on the specific application or interface, but typically, it's set between 0 and 1. Adjusting the temperature allows users to tailor the balance between coherence and creativity in the model's responses to better suit specific tasks or preferences.\n",
    "\n",
    "The following code allows you to try out system prompts, temperatures, models, and requests.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "TMF-rtxgRAea"
   },
   "outputs": [],
   "source": [
    "from langchain.chains import ConversationChain\n",
    "from langchain.memory import ConversationBufferWindowMemory\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.prompts.chat import PromptTemplate\n",
    "from IPython.display import display_markdown\n",
    "\n",
    "MODEL = 'gpt-4o-mini'\n",
    "TEMPERATURE = 0.25\n",
    "TEMPLATE = \"\"\"The following is a friendly conversation between a human and an\n",
    "AI. Format answers in markdown, and provide truthful and accurate answers.\n",
    "\n",
    "Current conversation:\n",
    "{history}\n",
    "Human: {input}\n",
    "Code Assistant:\"\"\"\n",
    "PROMPT_TEMPLATE = PromptTemplate(input_variables=[\"history\", \"input\"], template=TEMPLATE)\n",
    "\n",
    "def start_conversation():\n",
    "    # Initialize the OpenAI LLM with your API key\n",
    "    llm = ChatOpenAI(\n",
    "        model=MODEL,\n",
    "        temperature=TEMPERATURE,\n",
    "        n=1\n",
    "    )\n",
    "\n",
    "    # Initialize memory and conversation\n",
    "    memory = ConversationBufferWindowMemory()\n",
    "    conversation = ConversationChain(\n",
    "        prompt=PROMPT_TEMPLATE,\n",
    "        llm=llm,\n",
    "        memory=memory,\n",
    "        verbose=False\n",
    "    )\n",
    "\n",
    "    return conversation\n",
    "\n",
    "def query_llm(conversation, prompt):\n",
    "    print(\"Model response:\")\n",
    "    output = conversation.invoke(prompt)\n",
    "    display_markdown(output['response'], raw=True)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 253
    },
    "id": "WUPrAkqDW6GL",
    "outputId": "83cd58dc-b86a-48a0-a867-0ee3d75bd9bf"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Model response:\n"
     ]
    },
    {
     "data": {
      "text/markdown": [
       "Sure! Here is a table of the top 5 most populous countries with their population and GDP:\n",
       "\n",
       "| Country       | Population (millions) | GDP (trillion USD) |\n",
       "|---------------|-----------------------|--------------------|\n",
       "| China         | 1,439                 | 14.34              |\n",
       "| India         | 1,380                 | 2.87               |\n",
       "| United States | 331                   | 21.43              |\n",
       "| Indonesia     | 276                   | 1.12               |\n",
       "| Pakistan      | 225                   | 0.30               |"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "conversation = start_conversation()\n",
    "query_llm(conversation, \"\"\"\n",
    "Produce a table the top 5 most populous countries with population and GDP.\n",
    "\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "3TjZs_TRht1n"
   },
   "source": [
    "# Module 3 Assignment\n",
    "\n",
    "You can find the first assignment here: [assignment 3](https://github.com/jeffheaton/app_generative_ai/blob/main/assignments/assignment_yourname_class3.ipynb)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3.11 (genai)",
   "language": "python",
   "name": "genai"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.8"
  },
  "varInspector": {
   "cols": {
    "lenName": 16,
    "lenType": 16,
    "lenVar": 40
   },
   "kernels_config": {
    "python": {
     "delete_cmd_postfix": "",
     "delete_cmd_prefix": "del ",
     "library": "var_list.py",
     "varRefreshCmd": "print(var_dic_list())"
    },
    "r": {
     "delete_cmd_postfix": ") ",
     "delete_cmd_prefix": "rm(",
     "library": "var_list.r",
     "varRefreshCmd": "cat(var_dic_list()) "
    }
   },
   "types_to_exclude": [
    "module",
    "function",
    "builtin_function_or_method",
    "instance",
    "_Feature"
   ],
   "window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
